{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "aklJxkHBD5aR"
   },
   "source": [
    "# LAB 2: AutoML Tables Babyweight Training\n",
    "\n",
    "**Learning Objectives**\n",
    "\n",
    "1. Set up AutoML Tables\n",
    "1. Create and import AutoML Tables dataset from BigQuery\n",
    "1. Analyze AutoML Tables dataset\n",
    "1. Train AutoML Tables model\n",
    "1. Check evaluation metrics\n",
    "1. Deploy model\n",
    "1. Make batch predictions\n",
    "1. Make online predictions\n",
    "\n",
    "\n",
    "## Introduction\n",
    "In this notebook, we will use AutoML Tables to train a model to predict the weight of a baby before it is born. We will use the AutoML Tables UI to create a training dataset from BigQuery and will then train, evaluate, and predict with an AutoML Tables model.\n",
    "\n",
    "In this lab, we will set up AutoML Tables, create and import an AutoML Tables dataset from BigQuery, analyze the dataset, train an AutoML Tables model, check the trained model's evaluation metrics, deploy the trained model, and finally make both batch and online predictions with it.\n",
    "\n",
    "Each learning objective will correspond to a series of steps to complete in this student lab notebook."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Verify tables exist\n",
    "\n",
    "Run the following cells to verify that we previously created the dataset and data tables. If not, go back to lab [1b_prepare_data_babyweight](../solutions/1b_prepare_data_babyweight.ipynb) to create them."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bigquery\n",
    "-- LIMIT 0 is a free query; this allows us to check that the table exists.\n",
    "SELECT * FROM babyweight.babyweight_augmented_data\n",
    "LIMIT 0"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Lab Task #1: Set up AutoML Tables"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 1: Open AutoML Tables\n",
    "Go to the GCP console and open the console menu in the upper left corner. Then scroll down to the bottom to get to the `Artificial Intelligence` section. Click on `Tables` to open AutoML Tables."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"./assets/1_automl_tables_hamburger_dropdown.png\" width='70%'>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 2: Enable API\n",
    "If you haven't already enabled the AutoML Tables API, then you'll see the screen below. Make sure to click the `ENABLE API` button."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"./assets/2_automl_tables_enable_api.png\" width='70%'>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 3: Get started\n",
    "If this is your first time using AutoML Tables, then you'll see the screen below. Make sure to click the `GET STARTED` button."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"./assets/3_automl_tables_get_started.png\" width='70%'>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Lab Task #2: Create and import AutoML Tables dataset from BigQuery"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 4: Datasets\n",
    "You should now be on the AutoML Tables Datasets page. This is where all imported datasets are shown. We want to add our babyweight dataset, so click the `+ NEW DATASET` button."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"./assets/4_automl_tables_click_create_dataset.png\" width='70%'>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 5: Create new dataset\n",
    "We need to give our new dataset a unique name. I named mine `babyweight_automl` but feel free to name yours whatever you want. When you are done choosing a unique name, click the `CREATE DATASET` button."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"./assets/5_automl_tables_create_new_dataset.png\" width='70%'>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 6: Import your data\n",
    "Now that we've created a dataset, let's import our data so that AutoML Tables can use it for training. Our data is already in BigQuery, so we will select the radio button `Import data from BigQuery`. This will give us some text boxes to fill in with our data's `BigQuery Project ID`, `BigQuery Dataset ID`, and `BigQuery Table or View ID`. Once you are done entering those, click the `IMPORT` button."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"./assets/6_automl_tables_import_data.png\" width='70%'>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 7: Wait for your data to be imported\n",
    "AutoML Tables should now be importing your data from BigQuery. Depending on the size of your dataset, this could take a while, so this step is about just waiting and being patient."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"./assets/7_automl_tables_importing_data.png\" width='70%'>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 8: Select target column\n",
    "Awesome! Our dataset has been successfully imported! You can now look at the dataset's schema, which shows the name, data type, and nullability of each column. Out of these columns, we need to select the one we want to be our target or label column. Click the drop down for `Target column` and choose `weight_pounds`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"./assets/8_automl_tables_schema_target_column.png\" width='70%'>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 9: Approve target and schema\n",
    "When you successfully choose your target column, you will see a green checkmark and the target tag added to the column row on the right. It will also disable the column's nullability, since a model cannot learn from rows whose label is null."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"./assets/9_automl_tables_schema_continue.png\" width='70%'>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Lab Task #3: Analyze AutoML Tables dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 10: Analyze\n",
    "We can analyze some basic statistics here. We have 6 columns, 4 of which are numeric and 2 of which are categorical. There are 0% missing and 0 invalid values across all of our columns, which is great! We can also compare the number of distinct values in each column with our expectations. Additionally, the table shows each column's linear correlation with the target column, `weight_pounds` in this instance, as well as its mean and standard deviation. Once you are satisfied with the analysis, click **TRAIN MODEL**."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"./assets/10_automl_tables_analyze.png\" width='70%'>"
   ]
  },
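  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To build intuition for the statistics shown on this page, they can be reproduced by hand on a tiny sample. The following is a minimal sketch in plain Python, assuming a handful of made-up `gestation_weeks` and `weight_pounds` values (they are not taken from the real table). It computes the mean, the standard deviation, and the linear (Pearson) correlation with the target. The population form of the standard deviation is an assumption; the UI may use a different convention.\n",
    "\n",
    "```python\n",
    "import math\n",
    "\n",
    "\n",
    "def mean(values):\n",
    "    # Arithmetic mean of a non-empty list of numbers.\n",
    "    return sum(values) / len(values)\n",
    "\n",
    "\n",
    "def std(values):\n",
    "    # Population standard deviation (an assumption, see the text above).\n",
    "    m = mean(values)\n",
    "    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))\n",
    "\n",
    "\n",
    "def pearson(xs, ys):\n",
    "    # Pearson (linear) correlation between two equally long lists.\n",
    "    mx, my = mean(xs), mean(ys)\n",
    "    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))\n",
    "    norm = math.sqrt(\n",
    "        sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)\n",
    "    )\n",
    "    return cov / norm\n",
    "\n",
    "\n",
    "# Made-up illustrative values, not real rows from the babyweight table.\n",
    "gestation_weeks = [36, 38, 39, 40, 41]\n",
    "weight_pounds = [5.9, 6.8, 7.3, 7.6, 8.1]\n",
    "\n",
    "print('mean of target:', mean(weight_pounds))\n",
    "print('std of target:', std(weight_pounds))\n",
    "print('correlation with target:', pearson(gestation_weeks, weight_pounds))\n",
    "```\n",
    "\n",
    "A correlation close to 1 or -1 indicates a strong linear relationship with the target, while a value near 0 only rules out a linear one."
   ]
  },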
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Lab Task #4: Train AutoML Tables model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 11: Set up training\n",
    "We are almost ready to train our model. It took a lot of steps to get here, but those were mainly to import the data and make sure it looks right. Data is extremely important for ML: if it is not what we expect, then our model will not perform as we expect either. Garbage in, garbage out. We need to set the `Training budget`, which is the maximum number of node hours to spend training our model. Thankfully, if improvement stops before the budget is used up, training stops and you are only charged for the node hours actually used. For this dataset, I got decent results with a budget of just 1 to 3 node hours. We also need to choose which features to use in our model with the `Input feature selection` dropdown, which is covered in the next step. Once all of that is set, click the `TRAIN MODEL` button."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"./assets/11_automl_tables_train.png\" width='70%'>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 12: Input feature selection\n",
    "We imported six columns, one of which, `weight_pounds`, we have set aside to be our target or label column. This leaves five columns. Clicking the `Input feature selection` dropdown provides you with a list of all of the remaining columns. We want `is_male`, `mother_age`, `plurality`, and `gestation_weeks` as our four features. `hashmonth` is left over from when we did our repeatable splitting in the [1b_prepare_data_babyweight](../solutions/1b_prepare_data_babyweight.ipynb) lab. Every selected column is used for training, so click the checkbox next to `hashmonth` to deselect it."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"./assets/12_automl_tables_feature_selection.png\" width='70%'>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 13: Wait for model to train\n",
    "Woohoo! Our model is training! We are going to have an awesome model when it finishes! And now we wait. Depending on the size of your dataset, your training budget, and other factors, this could take anywhere from a couple of hours to over a day, so this step is just about waiting and being patient. A good thing to do while you are waiting is to keep going through the next labs in this series and then come back to this lab once training completes."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"./assets/13_automl_tables_training.png\" width='70%'>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Lab Task #5: Check evaluation metrics"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 14: Evaluate model\n",
    "Yay! Our model is done training! Now we can check the `EVALUATE` tab and see how well we did. It reminds you what the target was, `weight_pounds`, what the training was optimized for, `RMSE`, and then many evaluation metrics like MAE, RMSE, etc. My training run did great with an RMSE of 1.030 after only an hour of training! It really shows you the amazing power of AutoML! Below you can see a feature importance bar chart. `gestation_weeks` is by far the most important which makes sense because usually the longer someone has been pregnant, the longer the baby has had time to grow, and therefore the heavier the baby weighs."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"./assets/14_automl_tables_evaluate.png\" width='70%'>"
   ]
  },
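  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the two headline metrics concrete, here is a minimal sketch in plain Python of how RMSE and MAE are defined. The actual and predicted weights are made-up numbers for illustration only, not output from the trained model.\n",
    "\n",
    "```python\n",
    "import math\n",
    "\n",
    "\n",
    "def rmse(actual, predicted):\n",
    "    # Root mean squared error: penalizes large errors more heavily.\n",
    "    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]\n",
    "    return math.sqrt(sum(squared) / len(squared))\n",
    "\n",
    "\n",
    "def mae(actual, predicted):\n",
    "    # Mean absolute error: the average size of the error in pounds.\n",
    "    absolute = [abs(a - p) for a, p in zip(actual, predicted)]\n",
    "    return sum(absolute) / len(absolute)\n",
    "\n",
    "\n",
    "# Made-up illustrative values in pounds.\n",
    "actual_weight = [7.5, 6.1, 8.2, 5.4]\n",
    "predicted_weight = [7.1, 6.9, 7.6, 6.8]\n",
    "\n",
    "print('RMSE:', rmse(actual_weight, predicted_weight))\n",
    "print('MAE:', mae(actual_weight, predicted_weight))\n",
    "```\n",
    "\n",
    "Both metrics are in the units of the target, so an RMSE of about 1 means the predictions are typically off by roughly one pound. RMSE is never smaller than MAE on the same data."
   ]
  },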
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Lab Task #6: Deploy model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 15: Deploy model for predictions\n",
    "If you are satisfied with how well our brand new AutoML Tables model trained and evaluated, then you'll probably want to do what ML is all about: making great predictions! To do that, we have to deploy our trained model. If you go to the main `Models` page for AutoML Tables, you'll see your trained model listed. It gives the model name, the dataset used, the problem type, the time of creation, the model size, and whether the model is deployed or not. Since we just finished training our model, `Deployed` should say `No`. Click the three vertical dots to the right and then click `Deploy model`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"./assets/15_automl_tables_deploy.png\" width='70%'>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 16: Deploy model confirmation\n",
    "You should now see a confirmation box pop up on your screen. This is just a confirmation making sure you really want to deploy your model because then there will be charges depending on the model size and the number of machines used. Please click the `DEPLOY` button."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"./assets/16_automl_tables_deploy_confirmation.png\" width='70%'>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Lab Task #7: Make batch predictions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 17: Create batch prediction job\n",
    "Great! Once the model is done deploying, `Deployed` should say `Yes` and you can click your model name and then the `TEST & USE` tab. You'll start out with batch prediction. To keep this easy, for now we can just predict on the same BigQuery table that we used for training and evaluation. To do that, select the radio button `Data from BigQuery` and then enter your `BigQuery Project Id`, `BigQuery Dataset Id`, and `BigQuery Table or View Id`. We could have also used CSVs from Google Cloud Storage. Then we need to select where we want to put our `Result`. Let's select the radio button `BigQuery project` and then enter our `BigQuery Project Id`. We also could have written the results to Google Cloud Storage. Once all of that is set, click `SEND BATCH PREDICTION`, which will submit a batch prediction job using our trained AutoML Tables model and the data at the location we chose above."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"./assets/17_automl_tables_batch_predict.png\" width='70%'>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 18: Batch prediction job finished\n",
    "After just a little bit of waiting, your batch predictions should be done. For me with my dataset it took just over 15 minutes. At the bottom of the `BATCH PREDICTION` page you should see a section labeled `Recent Predictions`. It shows the data input, where the results are stored, when it was created, and how long it took to process. Let's now move to the [BigQuery Console UI](https://console.cloud.google.com/bigquery) to have a look."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"./assets/18_automl_tables_batch_predict_results.png\" width='70%'>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 19: Batch prediction dataset\n",
    "On your list of projects on the far left, you will see the project you have been working in. Click the arrow to expand the dropdown list of all of the BigQuery datasets within the project. You'll see a new dataset there which is the same as what was shown for the `Results directory` from the last step. Expanding that dataset dropdown list you will see two BigQuery tables that have been created: `predictions` and `errors`. Let's first look at the `predictions` table."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"./assets/19_automl_tables_batch_predict_dataset.png\" width='70%'>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 20: Batch prediction predictions\n",
    "The `predictions` BigQuery table has essentially taken the input data of your batch prediction job and appended three new columns to it. Notice that even columns you did not use as features in your model, such as `hashmonth`, are still here. You should see the two `prediction_interval` columns for `start` and `end`. The last column is the prediction `value`, which for us is the predicted `weight_pounds` that our trained AutoML Tables model calculated using the corresponding features in the row."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"./assets/20_automl_tables_batch_predict_prediction_table.png\" width='70%'>"
   ]
  },
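  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One simple sanity check on these columns is to count how often the actual weight falls inside the predicted interval. The sketch below is plain Python over a few made-up rows. The flat keys `start`, `end`, and `value` are a simplifying assumption for illustration, and the real table may nest these fields differently.\n",
    "\n",
    "```python\n",
    "def interval_coverage(rows):\n",
    "    # Fraction of rows whose actual weight lies within [start, end].\n",
    "    hits = sum(1 for r in rows if r['start'] <= r['weight_pounds'] <= r['end'])\n",
    "    return hits / len(rows)\n",
    "\n",
    "\n",
    "# Made-up rows with hypothetical flattened column names.\n",
    "rows = [\n",
    "    {'weight_pounds': 7.2, 'value': 7.0, 'start': 5.1, 'end': 8.9},\n",
    "    {'weight_pounds': 4.0, 'value': 6.5, 'start': 4.6, 'end': 8.4},\n",
    "    {'weight_pounds': 8.0, 'value': 7.8, 'start': 5.9, 'end': 9.7},\n",
    "]\n",
    "\n",
    "print('coverage:', interval_coverage(rows))\n",
    "```\n",
    "\n",
    "Because this batch job predicted on the same table used for training, such a check is only a rough sanity test and not an unbiased estimate of how well the intervals are calibrated."
   ]
  },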
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 21: Batch prediction errors\n",
    "We can also look at the `errors` table for any possible errors. When I ran my batch prediction job, thankfully I didn't have any errors, but this is definitely the place to check in case you did. Since my `errors` table was empty, below you'll see only the schema. Once again, it has essentially taken the input data of your batch prediction job and appended three new columns to it. Each row stores the record as well as an error `code` and `error` message. These can be helpful in debugging any unwanted behavior."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"./assets/21_automl_tables_batch_predict_errors_table_schema.png\" width='70%'>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Lab Task #8: Make online predictions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 22: Online prediction setup\n",
    "We can also perform online prediction with our trained AutoML Tables model. To do that, in the `TEST & USE` tab, click `ONLINE PREDICTION`. You'll see something similar to the screen below, with a table of our model's features. Each feature has the column name, the column ID, the data type, the status (whether it is required or not), and a prepopulated value. You can leave those values as is or enter your own. For `Categorical` features, make sure to use valid values, or else they will just end up in the OOV (out of vocabulary) spill-over and not take full advantage of the training. When you're done setting your values, click the `PREDICT` button."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"./assets/22_automl_tables_before_online_predict.png\" width='70%'>"
   ]
  },
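  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The OOV spill-over mentioned above can be pictured with a small vocabulary lookup. The following is a conceptual sketch in plain Python, not AutoML Tables' actual implementation: every categorical value seen during training gets its own index, and any unseen value falls into a single shared OOV index. The `plurality` strings are illustrative assumptions.\n",
    "\n",
    "```python\n",
    "def build_vocab(training_values):\n",
    "    # Assign indices 1, 2, ... to distinct values; 0 is reserved for OOV.\n",
    "    vocab = {}\n",
    "    for value in training_values:\n",
    "        if value not in vocab:\n",
    "            vocab[value] = len(vocab) + 1\n",
    "    return vocab\n",
    "\n",
    "\n",
    "def encode(value, vocab, oov_index=0):\n",
    "    # Values not seen in training all share the same OOV index.\n",
    "    return vocab.get(value, oov_index)\n",
    "\n",
    "\n",
    "# Illustrative category strings (assumed, not read from the dataset).\n",
    "vocab = build_vocab(['Single(1)', 'Twins(2)', 'Single(1)', 'Triplets(3)'])\n",
    "\n",
    "print(encode('Twins(2)', vocab))  # a known category keeps its own index\n",
    "print(encode('twins', vocab))     # a misspelled value lands in the OOV bucket\n",
    "```\n",
    "\n",
    "In this picture a misspelled category is indistinguishable from any other unknown value, which is why entering exact, valid category strings matters for online prediction."
   ]
  },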
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 23: Online prediction result\n",
    "After just a moment, you should see your online predictions appear on your screen. There will be a `Prediction result` as well as a `95% prediction interval` returned. You can try other values for each feature and see what predictions they result in!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"./assets/23_automl_tables_after_online_predict.png\" width='70%'>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Lab Summary\n",
    "In this lab, we set up AutoML Tables, created and imported an AutoML Tables dataset from BigQuery, analyzed the dataset, trained an AutoML Tables model, checked the trained model's evaluation metrics, deployed the trained model, and finally made both batch and online predictions with it."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "PK_-WNGUD5bX"
   },
   "source": [
    "Copyright 2019 Google Inc. Licensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "colab": {
   "default_view": {},
   "name": "babyweight_bqml.ipynb",
   "provenance": [],
   "version": "0.3.2",
   "views": {}
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
